Indexing `DataFrame`

In pandas, both Series and DataFrame objects carry an index. The index holds the row labels and corresponds to axis 0. An index can be autogenerated or set explicitly. This guide covers setting an index, resetting it, and working with multi-level indices.
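To make the difference between an autogenerated and an explicit index concrete, here is a minimal sketch. The names and scores are made up for illustration and are not part of the datasets used in this guide.

```python
import pandas as pd

# Autogenerated index: pandas assigns a RangeIndex (0, 1, 2, ...)
scores = pd.Series([90, 85, 77])

# Explicit index: the labels are supplied by the caller (hypothetical names)
labeled = pd.Series([90, 85, 77], index=["alice", "bob", "carol"])

# For a DataFrame, the row labels are stored on axis 0
frame = pd.DataFrame({"score": [90, 85, 77]}, index=["alice", "bob", "carol"])

print(scores.index)
print(labeled.index.tolist())
print(frame.axes[0].tolist())
```

With an explicit index, rows can be looked up by label, for example `labeled.loc["bob"]`.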

Setting Index

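The set_index() function sets one or more columns of a DataFrame as its index. By default it returns a new DataFrame rather than modifying the original, and the new index replaces the current one, so the old index labels are discarded. To preserve the current index, copy it into a new column before setting the new index, or pass append=True to keep it as an additional index level.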

Example: Setting Index

import pandas as pd

# Importing the dataset
df = pd.read_csv("datasets/Admission_Predict.csv", index_col=0)
df.head()

# Preserve the serial number into a new column
df['Serial Number'] = df.index

# Set the index to 'Chance of Admit ' (the column name used here ends with a space)
df = df.set_index('Chance of Admit ')
df.head()
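The following sketch shows what happens to the old index under a few set_index() variants. It uses a small made-up frame instead of the admissions CSV; the column names and values are illustrative assumptions.

```python
import pandas as pd

# Made-up stand-in for the admissions data (values are illustrative only)
toy = pd.DataFrame(
    {"GRE Score": [337, 324, 316], "Chance of Admit": [0.92, 0.76, 0.72]},
    index=pd.Index([1, 2, 3], name="Serial No."),
)

# Default behavior: the old index is dropped; toy itself is left unchanged
replaced = toy.set_index("Chance of Admit")

# Option 1: move the old index into a column first, then set the new index
kept_as_column = toy.reset_index().set_index("Chance of Admit")

# Option 2: append=True keeps the old index as the outer level of a MultiIndex
kept_as_level = toy.set_index("Chance of Admit", append=True)

print(replaced.head())
print(kept_as_column.head())
print(kept_as_level.head())
```

Which option fits depends on whether the old labels are still needed for lookups (keep them as a level) or only as data (keep them as a column).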

Resetting Index

The reset_index() function moves the index back into a regular column and replaces it with the default integer index (0, 1, 2, ...). Like set_index(), it returns a new DataFrame.

df = df.reset_index()
df.head()
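As a small illustration on made-up data, the sketch below contrasts the default reset_index() with drop=True, which discards the old index instead of turning it into a column.

```python
import pandas as pd

# Made-up frame indexed by an illustrative 'Chance of Admit' column
toy = pd.DataFrame(
    {"GRE Score": [337, 324, 316]},
    index=pd.Index([0.92, 0.76, 0.72], name="Chance of Admit"),
)

# Default: the index becomes the first column and a RangeIndex takes its place
restored = toy.reset_index()

# drop=True: the old index is thrown away rather than kept as a column
discarded = toy.reset_index(drop=True)

print(restored)
print(discarded)
```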

Multi-Level Indexing

Pandas supports multi-level indexing, similar to composite keys in relational databases. This feature allows you to create hierarchical indices using multiple columns.

Example: Multi-Level Indexing with Census Data

# Importing census data
df = pd.read_csv('datasets/census.csv')
df.head()

# Filtering to keep only county-level data
df = df[df['SUMLEV'] == 50]

# Reducing columns for simplicity
columns_to_keep = ['STNAME', 'CTYNAME', 'BIRTHS2010', 'BIRTHS2011', 'BIRTHS2012', 'BIRTHS2013',
                   'BIRTHS2014', 'BIRTHS2015', 'POPESTIMATE2010', 'POPESTIMATE2011',
                   'POPESTIMATE2012', 'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']
df = df[columns_to_keep]
df.head()

# Setting a multi-level index
df = df.set_index(['STNAME', 'CTYNAME'])
df.head()
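The same step can be tried without the census file. The sketch below builds a tiny frame with made-up birth counts and inspects the resulting MultiIndex; the `is_unique` check reflects the composite-key analogy, since a key is only useful if each combination identifies one row.

```python
import pandas as pd

# Tiny stand-in for the census data; the numbers are invented
census = pd.DataFrame({
    "STNAME": ["Michigan", "Michigan", "Ohio"],
    "CTYNAME": ["Washtenaw County", "Wayne County", "Franklin County"],
    "BIRTHS2010": [100, 200, 150],
})

# Passing a list of columns produces a two-level (hierarchical) index
indexed = census.set_index(["STNAME", "CTYNAME"])

print(indexed.index.nlevels)     # number of index levels
print(list(indexed.index.names)) # level names, outermost first
print(indexed.index.is_unique)   # whether each (state, county) pair is unique
```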

Querying with Multi-Level Index

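With a multi-level index, the loc attribute accepts one label per level, given in level order from the outermost level inward. To select several rows at once, pass a list of tuples, where each tuple holds the full set of labels for one row.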

# Querying data for Washtenaw County, Michigan
df.loc['Michigan', 'Washtenaw County']

# Comparing two counties: Washtenaw County and Wayne County
df.loc[[('Michigan', 'Washtenaw County'), ('Michigan', 'Wayne County')]]
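A few related lookups are sketched below on a small made-up frame. The use of `xs()` to query by the inner level alone is an addition not covered in the text above; the data values are invented.

```python
import pandas as pd

# Small made-up frame with a (state, county) MultiIndex
toy = pd.DataFrame(
    {"BIRTHS2010": [100, 200, 150], "POPESTIMATE2010": [1000, 2000, 1500]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Michigan", "Washtenaw County"),
            ("Michigan", "Wayne County"),
            ("Ohio", "Franklin County"),
        ],
        names=["STNAME", "CTYNAME"],
    ),
)

# Outer level only: every county in one state (the state level is dropped)
michigan = toy.loc["Michigan"]

# Full key as a tuple: a single row, returned as a Series
washtenaw = toy.loc[("Michigan", "Washtenaw County")]

# Inner level only: xs() selects by a named level
wayne = toy.xs("Wayne County", level="CTYNAME")

print(michigan)
print(washtenaw)
print(wayne)
```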

Hierarchical Labeling

Hierarchical indexing isn't limited to rows; columns can carry multi-level labels as well. This is useful when related columns should be grouped under a shared top-level label, and it makes wide tables easier to read and query.

Transposing Data with Hierarchical Column Labels

Transposing a DataFrame swaps its rows and columns, so a multi-level row index becomes a set of hierarchical column labels.

df.T  # Transposing the DataFrame
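The sketch below shows the effect of the transpose on a small made-up frame and how the resulting hierarchical columns can be selected. The data values are invented for illustration.

```python
import pandas as pd

# Small made-up frame with a (state, county) MultiIndex on the rows
toy = pd.DataFrame(
    {"BIRTHS2010": [100, 200, 150], "POPESTIMATE2010": [1000, 2000, 1500]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Michigan", "Washtenaw County"),
            ("Michigan", "Wayne County"),
            ("Ohio", "Franklin County"),
        ],
        names=["STNAME", "CTYNAME"],
    ),
)

# After transposing, the row MultiIndex becomes the column labels
transposed = toy.T

# Selecting by the top column level returns all counties in that state
michigan_cols = transposed["Michigan"]

# A single cell: row label first, then the full column key as a tuple
wayne_births = transposed.loc["BIRTHS2010", ("Michigan", "Wayne County")]

print(transposed)
print(michigan_cols)
print(wayne_births)
```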
